192-2007: Latent Class Analysis in SAS®—Promise, Problems, and Programming
Author
Abstract
Latent class analysis (LCA) is an important tool for marketing professionals who must characterize subgroups within large and heterogeneous populations. LCA is also of interest to clinical professionals who must place clients in diagnostic or prognostic categories when a gold standard for doing so is poorly defined. Attempts to bring LCA into the SAS® mainstream are fairly recent. The paper discusses these efforts and demonstrates a SAS macro that combines PROC CATMOD with conventional DATA steps to perform LCA. The macro is demonstrated on data in which four binary observed variables permit estimation of two hypothesized latent classes. LCA is a categorical analog to factor analysis and posits the existence of unobserved classes to explain the pattern of association observed in a multidimensional contingency table. LCA estimates two types of parameters: (1) latent class prevalences and (2) probabilities, conditional on class membership, of individuals' responses on each observed variable. The SAS macro estimates these parameters using a classic expectation-maximization (E-M) algorithm. Maximization steps specify a log-linear model in PROC CATMOD, while expectation steps employ standard DATA step programming. The presentation illustrates the usefulness of LCA and probes certain problems and limitations associated with constructing and interpreting LC models. LC parameter estimates are sensitive to their initial values, and the classic E-M approach does not estimate standard errors. Bootstrapping of standard errors and replicated analyses using a grid of initial estimates are among the approaches that can address these limitations.
Search keywords: categorical data; diagnosis
INTRODUCTION
Latent class analysis (LCA) is a categorical analog to factor analysis. Factor analysis defines unobserved factors to which the complex covariance structure of a multivariable sample can be attributed. Similarly, LCA posits unobserved (latent) classes to explain complex associations in a multidimensional contingency table. The observed categorical variables are typically called "manifest indicators." LCA estimates two types of population parameters: (1) the prevalence of each latent class, the number of which the analyst must specify a priori; and (2) the probabilities, conditional on latent class membership, that an individual demonstrates a specific response to an observed variable. Its ability to detect an unobserved categorical structure makes LCA an important tool for marketing professionals who seek to identify subgroups within large and heterogeneous populations. LCA is equally useful for clinical professionals who must place clients in diagnostic or prognostic categories when a gold standard for doing so is poorly defined.
SAS has no built-in procedure for latent class analysis. Investigators who wish to use SAS to perform latent class analysis must write algorithms in SAS's matrix language, PROC IML, or learn lesser-used procedures. IML modules that perform latent class analysis include one by the author (Thompson, 2003) and latent class regression macros developed at the Johns Hopkins School of Public Health (Bandeen-Roche, Miglioretti, Zeger, & Rathouz, 1997). A group at Penn State's Methodology Center has produced a beta version of a module they call PROC LCA (Methodology Center, 2007). Other researchers have applied latent class models to assess diagnostic accuracy by using SAS PROC NLIN, which performs nonlinear regression using weighted least squares estimation (Engels, Sinclair, Biggar, Whitby, & Goedert, et al., 2000; Blick & Hagen, 2002). This paper illustrates one approach to LCA that relies on IML and another that attempts to use conventional PROC and DATA steps. The programs currently accommodate up to four binary manifest variables to estimate the latent structure of a single unobserved variable with two hypothesized classes.
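To make the two parameter types and the E-M idea concrete, here is a minimal Python sketch of a two-class model with four binary indicators. It is not the paper's SAS macro: the maximization step below uses closed-form updates from expected counts rather than a log-linear model in PROC CATMOD, the function names (pattern_prob, em_lca, best_of_starts) are hypothetical, and the data are synthetic expected counts generated from made-up parameters, not the Stouffer and Toby table. The best_of_starts helper illustrates, under the same assumptions, the idea of rerunning the analysis from several initial estimates because results can depend on starting values.

```python
import itertools
import math
import random


def pattern_prob(pattern, probs):
    """P(response pattern | class), assuming items are independent within a class."""
    p = 1.0
    for y, pj in zip(pattern, probs):
        p *= pj if y == 1 else 1.0 - pj
    return p


def em_lca(counts, prev, cond, max_iter=5000, tol=1e-10):
    """Fit a latent class model to pattern counts with a basic E-M loop.

    counts: dict mapping each 0/1 response pattern (tuple) to its frequency
    prev:   starting class prevalences
    cond:   cond[c][j] = starting P(item j = 1 | class c)
    Returns (prevalences, conditional probabilities, log-likelihood from the last E-step).
    """
    prev = list(prev)
    cond = [list(row) for row in cond]
    n = sum(counts.values())
    n_classes, n_items = len(prev), len(cond[0])
    loglik = -math.inf
    for _ in range(max_iter):
        # E-step: split each pattern's count across classes by posterior probability.
        weight = [0.0] * n_classes
        ones = [[0.0] * n_items for _ in range(n_classes)]
        new_loglik = 0.0
        for pattern, freq in counts.items():
            joint = [prev[c] * pattern_prob(pattern, cond[c]) for c in range(n_classes)]
            total = sum(joint)
            new_loglik += freq * math.log(total)
            for c in range(n_classes):
                w = freq * joint[c] / total
                weight[c] += w
                for j, y in enumerate(pattern):
                    ones[c][j] += w * y
        if new_loglik - loglik < tol:  # stop once the log-likelihood stalls
            loglik = new_loglik
            break
        loglik = new_loglik
        # M-step: closed-form updates from the expected counts.
        prev = [weight[c] / n for c in range(n_classes)]
        cond = [[ones[c][j] / weight[c] for j in range(n_items)] for c in range(n_classes)]
    return prev, cond, loglik


def best_of_starts(counts, n_classes, n_items, n_starts=20, seed=0):
    """Rerun E-M from several random starts and keep the highest log-likelihood."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        raw = [rng.uniform(0.2, 0.8) for _ in range(n_classes)]
        prev0 = [r / sum(raw) for r in raw]
        cond0 = [[rng.uniform(0.1, 0.9) for _ in range(n_items)] for _ in range(n_classes)]
        fit = em_lca(counts, prev0, cond0)
        if best is None or fit[2] > best[2]:
            best = fit
    return best


# Synthetic illustration only: expected counts for 1,000 respondents
# under made-up parameters (an assumption, not data from the paper).
true_prev = [0.4, 0.6]
true_cond = [[0.9, 0.8, 0.85, 0.7], [0.2, 0.1, 0.15, 0.3]]
counts = {
    pat: 1000 * sum(true_prev[c] * pattern_prob(pat, true_cond[c]) for c in range(2))
    for pat in itertools.product((0, 1), repeat=4)
}

prev_hat, cond_hat, loglik = em_lca(counts, [0.5, 0.5], [[0.6] * 4, [0.4] * 4])
best_prev, best_cond, best_loglik = best_of_starts(counts, n_classes=2, n_items=4)
print("prevalences:", [round(p, 3) for p in prev_hat])
print("P(item = 1 | class):", [[round(p, 3) for p in row] for row in cond_hat])
```

Working on the 16 response-pattern counts rather than on individual records mirrors the contingency-table view of LCA described above. The class labels are arbitrary, so a different start may return the same solution with the two classes swapped.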
Stouffer and Toby (1951) published a contingency table containing data on 216 observations that others have used (Goodman, 1974, 2002; McCutcheon, 1987, Table 1.2, p. 10) to illustrate latent class analysis. The original research explored undergraduate students' responses to four stories that forced them to make ethical decisions when confronted with conflict between their roles as friends and their roles as members of larger social groups. The four ...
SAS Global Forum 2007, Statistics and Data Analysis
Similar resources
Clustering and combining pattern of metabolic syndrome components among Iranian population with latent class analysis
Background: Metabolic syndrome (MetS), a combination of risk factors for coronary heart disease and diabetes mellitus, is one of the most challenging public health issues worldwide. The aim of this study was to identify subgroups of participants on the basis of MetS components. Methods: The cross-sectional study took place in the districts related to Teh...
Latent Class Analysis of the cardiometabolic risk factors in children and adolescents: the CASPIAN-V study
Background: Cardio-metabolic syndrome indicates the clustering of several risk factors. The aims of this study were to identify subgroups of Iranian children and adolescents on the basis of the components of the cardio-metabolic syndrome and to assess the role of demographic characteristics, socioeconomic status, and lifestyle-related behaviors on the membership of participants in each lat...
An application of Measurement error evaluation using latent class analysis
Latent class analysis (LCA) is a method of evaluating nonsampling errors, especially measurement error, in categorical data. Biemer (2011) introduced four latent class modeling approaches: probability model parameterization, log-linear model, modified path model, and graphical model using path diagrams. These models are interchangeable. Latent class probability models express l...
A generalized implicit enumeration algorithm for a class of integer nonlinear programming problems
Presented here is a generalization of the implicit enumeration algorithm that can be applied when the objective function is being maximized and can be rewritten as the difference of two non-decreasing functions. Also developed is a computational algorithm, named linear speedup, that uses whatever explicit linear constraints are present to speed up the search for a solution. The method is easy to u...
Ordered weighted averaging in SAS: A MCDM application
This paper explores the use of the optimization procedures in SAS/OR software with application to the ordered weighted averaging (OWA) operators of decision-making. OWA, originally introduced by Yager (1988), has gained much interest among researchers; hence many applications in the areas of decision making, expert systems, data mining, approximate reasoning, fuzzy system and control have bee...
Publication date: 2007